1 Introduction

filled out by Daniel



2 Methodology

partly filled out by Daniel


Data preparation

Searches with volume 0 are removed from the data set. For the Ahrefs analysis, only searches with volume > 100 are considered. All analyses are based on the US location.
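The two filtering rules can be sketched as follows. This is a minimal illustration in Python, not the pipeline actually used: the record layout and the field name `volume` are assumptions, and dropping rows with a missing volume is a choice made only for this sketch.

```python
# Hypothetical records; field names and values are invented for illustration.
searches = [
    {"keyword": "a", "volume": 0},
    {"keyword": "b", "volume": 10},
    {"keyword": "c", "volume": 100},
    {"keyword": "d", "volume": 2500},
    {"keyword": "e", "volume": None},  # missing volume
]

def prepare(rows):
    """Drop searches with volume 0 (and, as an assumption, missing volume)."""
    return [r for r in rows if r["volume"] is not None and r["volume"] > 0]

def enrichment_subset(rows):
    """Keep only searches with volume strictly greater than 100."""
    return [r for r in rows if r["volume"] > 100]

cleaned = prepare(searches)
enrich = enrichment_subset(cleaned)
print([r["keyword"] for r in cleaned])  # ['b', 'c', 'd']
print([r["keyword"] for r in enrich])   # ['d']
```

Note that a volume of exactly 100 is excluded from the second subset because the threshold is strict.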



Data enrichment

~2.5 million searches were enriched with Ahrefs data. The added statistics are difficulty, return rate, clicks, region volume, and SERP features.



Overview of the data

Overview
Statistic                       Value
Total number of searches        ~306 million
Total volume of searches        ~303 billion
Searches with missing volume    0.51%
Mean search volume              989
Median search volume            10
Mean CPC                        0.61



3 Research Findings


Questions in searches

~14% of searches are in the form of a question. “how” is the most common question word.
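The report does not state how question-form searches were identified. One plausible rule, sketched here purely as an assumption, is to flag a query whose first word is a question word. The word list and the function name `question_word` are illustrative.

```python
from collections import Counter

# Assumed list of question words; the report does not specify which were used.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "who", "which", "can", "does", "is"}

def question_word(query):
    """Return the leading question word of a query, or None if there is none."""
    words = query.lower().split()
    if words and words[0] in QUESTION_WORDS:
        return words[0]
    return None

# Invented example queries.
queries = ["how to tie a tie", "weather", "what is seo", "how old is the earth", "youtube"]
leading = [question_word(q) for q in queries]
counts = Counter(w for w in leading if w is not None)
share = sum(counts.values()) / len(queries)

print(counts.most_common(1))  # [('how', 2)]
print(f"{share:.0%}")         # 60%
```

A first-word rule misses questions phrased differently (for example, a trailing "?"), so the share it produces depends on the chosen word list.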



Stopwords

“how” and “the” are the most common stopwords, each appearing in 6-8% of searches.



Keyword length

The most searched queries are 6-9 characters long, and search volume falls continuously for queries longer or shorter than that.



Keyword info categories

Internet & Telecom is the keyword category with the highest mean volume.

Arts & Entertainment, Internet & Telecom, and News, Media & Publications have the highest total volume.

Finance has the highest mean cost per click.



Keyword difficulty

As volume increases, the difficulty increases.

From linear regression, we find that each doubling of the volume is associated with an increase in difficulty of 1.63 points.
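A per-doubling effect corresponds to regressing difficulty on log2(volume): the slope is then the change in difficulty when volume doubles. The sketch below fits that model with ordinary least squares on synthetic points constructed to have a slope of 1.63. The intercept and the data are invented, and the exact model specification used in the analysis is an assumption.

```python
import math

def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data: difficulty = 5 + 1.63 * log2(volume). The intercept 5 is invented.
volumes = [200, 400, 800, 1600, 3200, 6400]
difficulties = [5 + 1.63 * math.log2(v) for v in volumes]

slope, intercept = ols_slope_intercept([math.log2(v) for v in volumes], difficulties)
print(round(slope, 2))  # 1.63

def predict(volume):
    """Predicted difficulty for a given volume under the fitted model."""
    return intercept + slope * math.log2(volume)

# Doubling the volume shifts the prediction by the slope.
print(round(predict(2000) - predict(1000), 2))  # 1.63
```

Using log base 2 is what makes the slope directly readable as "per doubling"; with a natural log the same fit would need the slope multiplied by ln 2.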


Difficulty and CPC are also correlated.



Spell types

Most of the searches with the highest volumes are attempts to go to a popular website.

Searches with highest volume
keyword       location  spell    spell_type           keyword_info_search_volume
jou tube      2840      youtube  showing_results_for  1.85e+08
youtube the   2840                                    1.85e+08
youi tue      2840      youtube  showing_results_for  1.85e+08
acerook       2840                                    1.85e+08
youetube      2840                                    1.85e+08
you tbut      2840      youtube  did_you_mean         1.85e+08
ykutube       2840      youtube  showing_results_for  1.85e+08
uotod         2840      youtube  showing_results_for  1.85e+08
utuen         2840      youtube  did_you_mean         1.85e+08
ytu tube      2840                                    1.85e+08


As a result, about half of all volume has a spell type, although only ~1.4% of searches have one.
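The gap between the share of searches and the share of volume follows from a few very high-volume misspellings dominating the total. A small sketch with invented numbers shows how the two shares are computed and how far apart they can be. The row layout is an assumption.

```python
# Hypothetical rows: (keyword, volume, spell_type). All values are invented.
rows = [
    ("youtub", 9000, "showing_results_for"),
    ("blue widgets", 40, None),
    ("how to bake bread", 30, None),
    ("garden hose", 20, None),
    ("red shoes", 10, None),
]

with_spell = [r for r in rows if r[2] is not None]

# Share counted per search versus share weighted by volume.
share_of_searches = len(with_spell) / len(rows)
share_of_volume = sum(r[1] for r in with_spell) / sum(r[1] for r in rows)

print(f"{share_of_searches:.0%} of searches")  # 20% of searches
print(f"{share_of_volume:.1%} of volume")      # 98.9% of volume
```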

Top 10 intended searches that are misspelled
spell      volume  site
youtube    35.3%   youtube
facebook   8.7%    facebook
amazon     7.6%    amazon
google     6.3%    google
weather    2.2%    weather
translate  1.6%    translate
com        1.5%    com
instagram  1.3%    instagram
walmart    1.3%    walmart
ebay       1.2%    ebay



Search volume

The top 2000 searches have extremely high volume, while the vast majority of the remaining searches have very low volume.

Note that many of these extremely high volume searches are not searches for something as such, but attempts to go to one of the popular sites above.



SERP features

Note: there are at least two additional SERP feature types (Knowledge Panel and Videos) for which the sample size is too small to include.

The SERP features that appear in the most searches are Image pack and People also ask.

The knowledge card has a large effect in reducing clicks per search (CPS), while the other SERP features have limited effect. Searches with the Shopping results SERP feature have higher CPS on average.

Easy keywords have fewer SERP features.

Thumbnail & Top stories is the most common SERP feature pairing.

Searches without SERP features tend to be low volume.

Searches with more SERP features have higher mean difficulty.
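Finding the most common SERP feature pairing amounts to counting every unordered pair of features that co-occur on a search. The following sketch shows one way to do this. The per-search feature sets are invented, and representing each search as a set of feature names is an assumption about the data layout.

```python
from collections import Counter
from itertools import combinations

# Hypothetical feature sets, one per search; feature names are taken from the text.
serp_features = [
    {"Thumbnail", "Top stories"},
    {"Thumbnail", "Top stories", "Image pack"},
    {"Image pack", "People also ask"},
    {"People also ask"},
    set(),  # a search without SERP features contributes no pairs
]

pair_counts = Counter()
for features in serp_features:
    # Sort so each unordered pair is always counted under the same key.
    for pair in combinations(sorted(features), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('Thumbnail', 'Top stories'), 2)]
```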



Return rate

Searches with high return rates tend to have lower difficulty and to receive many more clicks.

Comparison of searches with same volume but different return rates
return_rate  mean_cpc  mean_clicks  mean_difficulty
very high    0.96      71423        18.4
low          0.70      15094        25.6



International searches

International searches have higher volume overall.

## # A tibble: 2 x 2
##   region        volume
##   <chr>         <chr> 
## 1 US            33%   
## 2 International 67%

[Note, this doesn’t concur with the volume in the main data set. Not sure what is going on here.]

Internationally there are more searches with very low volume, while the US has more searches with medium volume.

There is not a large difference in the number of searches with very high volume. However, the total volume of these searches is much higher internationally.

Searches that have high US volume tend to have high international volume, and vice versa, but there are some exceptions.

Searches that have much higher volume internationally
keyword          us_volume  international_volume
filmoviplex      10         295990
cloroquina       200        5869800
parivahan sewa   10         276990
jokaroom         10         173990
handball em      20         327980

Searches that have much higher volume in the US
keyword                    us_volume  international_volume
football playoff schedule  602000     1000
frontier mail              586000     1000
spectrum mobile            526000     1000
chase bank near me         523000     1000
spectrum internet          998000     2000
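One way to surface such exceptions is to rank searches by the ratio of US to international volume on a log scale. The sketch below uses rows copied from the two tables above. The choice of a log10 ratio as the ranking measure is an assumption, not necessarily how the tables were produced.

```python
import math

# Rows copied from the two tables: (keyword, us_volume, international_volume).
rows = [
    ("filmoviplex", 10, 295990),
    ("cloroquina", 200, 5869800),
    ("handball em", 20, 327980),
    ("football playoff schedule", 602000, 1000),
    ("spectrum internet", 998000, 2000),
]

def log_ratio(us, intl):
    """log10 of US volume over international volume; positive means US-heavy."""
    return math.log10(us / intl)

# Sort from most international-heavy to most US-heavy.
ranked = sorted(rows, key=lambda r: log_ratio(r[1], r[2]))
print(ranked[0][0])   # filmoviplex
print(ranked[-1][0])  # football playoff schedule
```

A ratio rather than a difference keeps low-volume keywords such as "filmoviplex" comparable with very high-volume ones.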

Searches that have higher volume in the US have higher clicks per search on average than searches that have higher volume internationally.

They also have a higher cost per click on average.

Searches that have higher volume internationally tend to have higher difficulty.